The HVT package is a collection of R functions that facilitate building topology-preserving maps for rich multivariate data analysis, particularly for big data with a large number of rows. The functions for this typical workflow are organized below:
Data Compression: Vector quantization (VQ), HVQ (hierarchical vector quantization) using means or medians. This step compresses the rows (long data frame) using a compression objective.
Data Projection: Dimension projection of the compressed cells to 1D, 2D, or an interactive surface plot using Sammon's non-linear mapping algorithm. This step creates topology-preserving map coordinates (also called embeddings) in the desired output dimension.
Tessellation: Create the cells required for object visualization using the Voronoi tessellation method. The package includes heatmap plots for hierarchical Voronoi tessellations (HVT). This step enables data insights, visualization, and interaction with the topology-preserving map, and is useful for semi-supervised tasks.
Scoring: Scoring new data sets and recording their assignment using the map objects from the above steps, in a sequence of maps if required.
Dynamic Analysis: A collection of functions designed to understand and visually represent the movement of data over time within a dynamic system, with the ability to forecast the next cell (t+1) by examining its underlying flow pattern.
The Lorenz attractor is a three-dimensional figure generated by a set of differential equations that model a simple chaotic dynamical system of convective flow. It arises from a simplified system of three variables that represent the state of the system at any given time and are typically denoted by (x, y, z). The equations are as follows:
\[ \frac{dx}{dt} = \sigma (y - x) \]
\[ \frac{dy}{dt} = x (r - z) - y \]
\[ \frac{dz}{dt} = x y - \beta z \]
where dx/dt, dy/dt, and dz/dt represent the rates of change of x, y, and z, respectively, over time (t). σ, r, and β are constant parameters of the system: σ (σ = 10) controls the rate of convection, r (r = 28) controls the difference in temperature between the convective and stable regions, and β (β = 8/3) represents the ratio of the width to the height of the convective layer. When these equations are plotted in three-dimensional space, they produce a chaotic trajectory that never repeats. The Lorenz attractor exhibits sensitive dependence on initial conditions: even small differences in the initial conditions can lead to drastically different trajectories over time. This sensitivity is a defining characteristic of chaotic systems.
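To make the equations concrete, the sketch below integrates them numerically in base R using the classical fourth-order Runge-Kutta scheme. The integrator, the function names `lorenz_rhs` and `simulate_lorenz`, and their defaults are illustrative assumptions and are not part of HVT. The initial state (0, 1, 20) and the step of 0.00025 were chosen to match the first rows of the dataset used below. We do not claim the dataset was generated this way, and the sketch does not reproduce the U column.

```r
# Right-hand side of the Lorenz system for a state s = c(x, y, z)
lorenz_rhs <- function(s, sigma = 10, r = 28, beta = 8 / 3) {
  c(sigma * (s[2] - s[1]),        # dx/dt
    s[1] * (r - s[3]) - s[2],     # dy/dt
    s[1] * s[2] - beta * s[3])    # dz/dt
}

# Illustrative integrator (assumption): classical fourth-order Runge-Kutta
simulate_lorenz <- function(s0 = c(0, 1, 20), dt = 0.00025, n_steps = 1000) {
  out <- matrix(NA_real_, nrow = n_steps + 1, ncol = 3,
                dimnames = list(NULL, c("X", "Y", "Z")))
  out[1, ] <- s0
  s <- s0
  for (i in seq_len(n_steps)) {
    k1 <- lorenz_rhs(s)
    k2 <- lorenz_rhs(s + dt / 2 * k1)
    k3 <- lorenz_rhs(s + dt / 2 * k2)
    k4 <- lorenz_rhs(s + dt * k3)
    s <- s + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    out[i + 1, ] <- s
  }
  # Columns X, Y, Z plus the time stamp of each step
  data.frame(out, t = (0:n_steps) * dt)
}

traj <- simulate_lorenz()
head(traj, 3)
```

Because the system is chaotic, small changes to `dt` or `s0` yield visibly different trajectories after a few time units. This is the sensitivity to initial conditions described above.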
In this notebook, we will use the Lorenz Attractor Dataset. This dataset contains 200,000 observations and 5 columns. The dataset can be downloaded from here.
The dataset includes the following columns:
X, Y, Z: the coordinates of the trajectory in 3D space.
U: velocity.
t: timestamp.
Here is a guide to installing the HVT package. Install the released version from CRAN, or use devtools to install the most recent version from the GitHub repository.
###direct installation###
#install.packages("HVT")
#or
###git repo installation###
#library(devtools)
#devtools::install_github(repo = "Mu-Sigma/HVT")
NOTE: At the time of writing this vignette, the updated changes were not yet available on CRAN. Hence, we source the scripts from the R folder directly into the session environment.
# Sourcing required code scripts for HVT
script_dir <- "../R"
r_files <- list.files(script_dir, pattern = "\\.R$", full.names = TRUE)
invisible(lapply(r_files, function(file) { source(file, echo = FALSE) }))
Here, we load the data and explore the Lorenz Attractor Dataset. For the sake of brevity, we display only the first ten rows.
library(dplyr)  # provides the %>% pipe used below
dataset <- read.csv("./sample_dataset/lorenze_attractor.csv")
dataset <- dataset %>% dplyr::select(X, Y, Z, U, t)
dataset$t <- round(dataset$t, 5)
Table(dataset, limit = 10)
| X | Y | Z | U | t |
|---|---|---|---|---|
| 0.0000000 | 1.0000000 | 20.00000 | 0.0000 | 0.00000 |
| 0.0024966 | 0.9997525 | 19.98669 | 0.0005 | 0.00025 |
| 0.0049863 | 0.9995101 | 19.97337 | 0.0010 | 0.00050 |
| 0.0074692 | 0.9992728 | 19.96006 | 0.0015 | 0.00075 |
| 0.0099454 | 0.9990405 | 19.94676 | 0.0020 | 0.00100 |
| 0.0124147 | 0.9988133 | 19.93347 | 0.0025 | 0.00125 |
| 0.0148774 | 0.9985912 | 19.92018 | 0.0030 | 0.00150 |
| 0.0173333 | 0.9983741 | 19.90691 | 0.0035 | 0.00175 |
| 0.0197826 | 0.9981621 | 19.89365 | 0.0040 | 0.00200 |
| 0.0222253 | 0.9979552 | 19.88040 | 0.0045 | 0.00225 |
Now let’s try to visualize the Lorenz attractor (overlapping spirals) in 3D Space.
# Sample 1,000 points to keep the 3D plot responsive
data_3d <- dataset[sample(nrow(dataset), 1000), ]
plot_3d <- plotly::plot_ly(data_3d, x = ~X, y = ~Y, z = ~Z) %>%
  plotly::add_markers(marker = list(
    size = 2,
    symbol = "circle",
    color = ~Z,
    colorscale = "Bluered",
    colorbar = list(title = "Z")))
plot_3d
Figure 1: Lorenz attractor in 3D space
Now let’s have a look at the structure of the Lorenz Attractor dataset.
str(dataset)
#> 'data.frame': 200000 obs. of 5 variables:
#> $ X: num 0 0.0025 0.00499 0.00747 0.00995 ...
#> $ Y: num 1 1 1 0.999 0.999 ...
#> $ Z: num 20 20 20 20 19.9 ...
#> $ U: num 0 0.0005 0.001 0.0015 0.002 ...
#> $ t: num 0 0.00025 0.0005 0.00075 0.001 0.00125 0.0015 0.00175 0.002 0.00225 ...
Data Distribution
This section displays four objects.
Variable Histograms: The histogram distribution of all the variables in the dataset.
Box Plots: Box plots for each numeric column in the dataset, shown across panels. These plots display the median and interquartile range (IQR) of each column at the panel level.
Correlation Matrix: The Pearson correlation coefficient, a bivariate measure of the linear correlation between two numeric columns, computed for every pair of columns. The output plot is shown as a matrix.
Summary EDA: The table provides descriptive statistics for all the variables in the dataset.
We use the function edaPlots to display the four objects mentioned above.
edaPlots(dataset, time_series = TRUE, time_column = 't')
| variable | min | 1st Quartile | median | mean | sd | 3rd Quartile | max | hist | n_row | n_missing |
|---|---|---|---|---|---|---|---|---|---|---|
| X | -18.0202 | -3.7356 | 0.8798 | 0.7083 | 7.8247 | 5.8663 | 16.7554 | ▂▃▇▅▃ | 2e+05 | 0 |
| Y | -24.2165 | -3.4265 | 0.7270 | 0.6957 | 9.0070 | 5.4724 | 21.8814 | ▁▂▇▃▂ | 2e+05 | 0 |
| Z | 5.6491 | 15.8927 | 21.6277 | 23.2424 | 8.8526 | 30.6142 | 44.7478 | ▃▇▅▅▂ | 2e+05 | 0 |
| U | -10.0000 | -3.9458 | 3.1532 | 1.8390 | 6.6585 | 8.1096 | 10.0000 | ▅▃▃▃▇ | 2e+05 | 0 |
| t | 0.0000 | 12.5000 | 25.0000 | 25.0000 | 14.4339 | 37.5000 | 50.0000 | ▇▇▇▇▇ | 2e+05 | 0 |
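The summary table and correlation matrix can be approximated with base R alone, which is handy for cross-checking the numbers edaPlots reports. The helper `eda_sketch` below is a hypothetical illustration, not the edaPlots implementation. It skips the plots and the `hist` sparkline column. It also uses R's default quantile definition (type 7), which may differ from the one edaPlots uses.

```r
# Hypothetical base-R approximation of the summary table and correlation matrix
eda_sketch <- function(df) {
  # Keep only numeric columns
  num <- df[vapply(df, is.numeric, logical(1))]
  # One row of descriptive statistics per variable
  stats <- t(vapply(num, function(v) {
    q <- quantile(v, c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
    c(min = min(v, na.rm = TRUE), q1 = q[1], median = q[2],
      mean = mean(v, na.rm = TRUE), sd = sd(v, na.rm = TRUE),
      q3 = q[3], max = max(v, na.rm = TRUE),
      n_row = length(v), n_missing = sum(is.na(v)))
  }, numeric(9)))
  list(
    summary = data.frame(variable = rownames(stats), stats,
                         row.names = NULL, stringsAsFactors = FALSE),
    # Pearson correlation for every pair of numeric columns
    correlation = cor(num, method = "pearson", use = "pairwise.complete.obs")
  )
}
```

For example, `eda_sketch(dataset[, c("X", "Y", "Z", "U")])` returns the summary statistics and the correlation matrix of the four numeric state variables.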
Train - Test Split
Let us split the dataset into training and testing sets. We select the first 80% of the rows, in time order, as the training set and the remaining 20% as the testing set.
noOfPoints <- dim(dataset)[1]
trainLength <- as.integer(noOfPoints * 0.8)
trainDataset <- dataset[1:trainLength,]
testDataset <- dataset[(trainLength+1):noOfPoints,]
rownames(testDataset) <- NULL
Let’s have a look at the training dataset, which contains 160,000 data points. For the sake of brevity, we display only the first 10 rows.
Table(trainDataset, limit = 10)
| X | Y | Z | U | t |
|---|---|---|---|---|
| 0.0000000 | 1.0000000 | 20.00000 | 0.0000 | 0.00000 |
| 0.0024966 | 0.9997525 | 19.98669 | 0.0005 | 0.00025 |
| 0.0049863 | 0.9995101 | 19.97337 | 0.0010 | 0.00050 |
| 0.0074692 | 0.9992728 | 19.96006 | 0.0015 | 0.00075 |
| 0.0099454 | 0.9990405 | 19.94676 | 0.0020 | 0.00100 |
| 0.0124147 | 0.9988133 | 19.93347 | 0.0025 | 0.00125 |
| 0.0148774 | 0.9985912 | 19.92018 | 0.0030 | 0.00150 |
| 0.0173333 | 0.9983741 | 19.90691 | 0.0035 | 0.00175 |
| 0.0197826 | 0.9981621 | 19.89365 | 0.0040 | 0.00200 |
| 0.0222253 | 0.9979552 | 19.88040 | 0.0045 | 0.00225 |
Now let’s have a look at the structure of the training dataset.
str(trainDataset)
#> 'data.frame': 160000 obs. of 5 variables:
#> $ X: num 0 0.0025 0.00499 0.00747 0.00995 ...
#> $ Y: num 1 1 1 0.999 0.999 ...
#> $ Z: num 20 20 20 20 19.9 ...
#> $ U: num 0 0.0005 0.001 0.0015 0.002 ...
#> $ t: num 0 0.00025 0.0005 0.00075 0.001 0.00125 0.0015 0.00175 0.002 0.00225 ...
Data Distribution
edaPlots(trainDataset, time_series = TRUE, time_column = 't')
| variable | min | 1st Quartile | median | mean | sd | 3rd Quartile | max | hist | n_row | n_missing |
|---|---|---|---|---|---|---|---|---|---|---|
| X | -18.0202 | -3.6928 | 1.0917 | 0.8511 | 7.8501 | 6.1564 | 16.7554 | ▂▃▇▆▃ | 160000 | 0 |
| Y | -24.2165 | -3.4047 | 0.9938 | 0.8913 | 9.0368 | 5.9268 | 21.8814 | ▁▂▇▃▂ | 160000 | 0 |
| Z | 5.6491 | 16.1278 | 21.8036 | 23.3181 | 8.7778 | 30.6148 | 44.7478 | ▃▇▆▅▂ | 160000 | 0 |
| U | -10.0000 | -5.4029 | 2.8225 | 1.4319 | 6.9893 | 8.1504 | 10.0000 | ▅▂▃▃▇ | 160000 | 0 |
| t | 0.0000 | 10.0000 | 20.0000 | 20.0000 | 11.5471 | 30.0000 | 40.0000 | ▇▇▇▇▇ | 160000 | 0 |
Let’s have a look at the testing dataset, which contains 40,000 data points. For the sake of brevity, we display only the first 10 rows.
Table(testDataset, limit = 10)
| X | Y | Z | U | t |
|---|---|---|---|---|
| 16.05834 | 13.65882 | 39.59945 | 9.893524 | 40.00020 |
| 16.05229 | 13.60880 | 39.62776 | 9.893451 | 40.00045 |
| 16.04613 | 13.55869 | 39.65584 | 9.893379 | 40.00070 |
| 16.03985 | 13.50850 | 39.68367 | 9.893306 | 40.00095 |
| 16.03347 | 13.45823 | 39.71126 | 9.893233 | 40.00120 |
| 16.02698 | 13.40789 | 39.73861 | 9.893160 | 40.00145 |
| 16.02037 | 13.35746 | 39.76572 | 9.893087 | 40.00170 |
| 16.01366 | 13.30696 | 39.79259 | 9.893014 | 40.00195 |
| 16.00684 | 13.25639 | 39.81921 | 9.892941 | 40.00220 |
| 15.99991 | 13.20574 | 39.84559 | 9.892868 | 40.00245 |
Now let’s have a look at the structure of the testing dataset.
str(testDataset)
#> 'data.frame': 40000 obs. of 5 variables:
#> $ X: num 16.1 16.1 16 16 16 ...
#> $ Y: num 13.7 13.6 13.6 13.5 13.5 ...
#> $ Z: num 39.6 39.6 39.7 39.7 39.7 ...
#> $ U: num 9.89 9.89 9.89 9.89 9.89 ...
#> $ t: num 40 40 40 40 40 ...
Data Distribution
edaPlots(testDataset, time_series = TRUE, time_column = 't')
| variable | min | 1st Quartile | median | mean | sd | 3rd Quartile | max | hist | n_row | n_missing |
|---|---|---|---|---|---|---|---|---|---|---|
| X | -16.2606 | -3.9065 | -0.0464 | 0.1371 | 7.6957 | 4.4283 | 16.0583 | ▂▃▇▃▂ | 40000 | 0 |
| Y | -20.9897 | -3.5599 | -0.5983 | -0.0863 | 8.8440 | 3.5431 | 19.5597 | ▂▂▇▂▂ | 40000 | 0 |
| Z | 7.9115 | 15.0266 | 20.8133 | 22.9399 | 9.1395 | 30.6121 | 41.3323 | ▆▇▅▃▅ | 40000 | 0 |
| U | -5.4402 | -0.7516 | 4.1210 | 3.4677 | 4.7921 | 7.9847 | 9.8935 | ▃▃▃▅▇ | 40000 | 0 |
| t | 40.0002 | 42.5001 | 45.0001 | 45.0001 | 2.8868 | 47.5001 | 50.0000 | ▇▇▇▇▇ | 40000 | 0 |
We will use the trainHVT function to compress our dataset while preserving its essential features.
Model Parameters
NOTE: The compression takes place only for the X, Y, Z coordinates and not for U (velocity) and t (timestamp). After training and scoring, we merge the U and t columns back into the dataset.
set.seed(240)
hvt.results <- trainHVT(
trainDataset[, -c(4:5)],  # X, Y, Z only (drop U and t)
n_cells = 100,
depth = 1,
quant.err = 0.1,
normalize = TRUE,
distance_metric = "L1_Norm",
error_metric = "max",
quant_method = "kmeans"
)
Let’s check out the compression summary.
displayTable(data = hvt.results[[3]]$compression_summary, columnName = 'percentOfCellsBelowQuantizationErrorThreshold', value = 0.8, tableType = "compression")
| segmentLevel | noOfCells | noOfCellsBelowQuantizationError | percentOfCellsBelowQuantizationErrorThreshold | parameters |
|---|---|---|---|---|
| 1 | 100 | 0 | 0 | n_cells: 100 quant.err: 0.1 distance_metric: L1_Norm error_metric: max quant_method: kmeans |
NOTE: As the table shows, the ‘percentOfCellsBelowQuantizationErrorThreshold’ value is zero. None of the 100 cells has a quantization error below the threshold of 0.1, so the desired compression has not been achieved with 100 cells. Typically, we would keep increasing n_cells until at least 80% of the cells fall below the threshold. However, we do not do so in this vignette, because the plots generated by the dynamic analysis functions would become cluttered and complex, making the explanations less clear.
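The following minimal base-R sketch illustrates what the percentOfCellsBelowQuantizationErrorThreshold column measures. It rests on several simplifying assumptions that mirror `distance_metric = "L1_Norm"` and `error_metric = "max"`:

- the data are z-score normalized;
- cells come from `stats::kmeans`;
- a point's error is its L1 distance to its cell centroid;
- a cell's quantization error is the maximum of those distances.

trainHVT's exact definitions may differ, so treat the numbers as indicative only. The function names are hypothetical.

```r
# Hypothetical sketch of a cell-level quantization error check (not trainHVT's code)
cell_quant_error <- function(data, n_cells, seed = 240) {
  # z-score normalization, analogous to normalize = TRUE
  x <- scale(as.matrix(data))
  set.seed(seed)
  # Partition the rows into n_cells cells with k-means
  km <- stats::kmeans(x, centers = n_cells, iter.max = 100, nstart = 5)
  # L1 distance of each point to the centroid of its own cell
  d <- rowSums(abs(x - km$centers[km$cluster, , drop = FALSE]))
  # Cell-level error: the largest point distance inside each cell
  tapply(d, km$cluster, max)
}

# Share of cells whose quantization error is below the threshold
percent_below <- function(errors, quant_err) mean(errors < quant_err)
```

A call such as `percent_below(cell_quant_error(trainDataset[, c("X", "Y", "Z")], n_cells = 100), 0.1)` reports the share of cells under the 0.1 threshold. Raising `n_cells` makes cells smaller and tends to increase that share.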
Now, let’s plot the Voronoi tessellation for 100 cells.
Figure 2: The Voronoi tessellation for layer 1, shown for the 100 cells in the dataset 'Lorenz attractor'
Now that we have built the model, let's score the testing dataset.
set.seed(240)
dataset_score <- testDataset[,-c(4:5)]
scoring_var <- scoreHVT(
dataset_score,
hvt.results,
  child.level = 1)
The Flow Map functions described in the next section require the Cell ID from the scoring output and the sorted timestamp from the dataset used for scoring. We therefore merge the two into a data frame that pairs each cell ID with its timestamp.
Let’s see which cell and level each point belongs to, along with the sorted timestamp. For the sake of brevity, we show only the first 100 rows.
scored_data <- scoring_var[["scoredPredictedData"]] %>%
  round(2) %>%
  cbind(testDataset) %>%
  as.data.frame()
colnames(scored_data) <- c("Segment.Level", "Segment.Parent", "Segment.Child", "n", "Cell.ID",
                           "Quant.Error", "pred_X", "pred_Y", "pred_Z", "centroidRadius",
                           "diff", "anomalyFlag", "X", "Y", "Z", "U", "t")
displayTable(data = scored_data, columnName = 'Quant.Error', value = 0.1, tableType = "summary", limit = 100)
| Segment.Level | Segment.Parent | Segment.Child | n | Cell.ID | Quant.Error | pred_X | pred_Y | pred_Z | centroidRadius | diff | anomalyFlag | X | Y | Z | U | t |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 27 | 1 | 4 | 0.11 | 1.94 | 1.41 | 1.85 | 0.17 | 0.07 | 0 | 16.06 | 13.66 | 39.60 | 9.89 | 40.00 |
| 1 | 1 | 27 | 1 | 4 | 0.1 | 1.94 | 1.41 | 1.86 | 0.17 | 0.07 | 0 | 16.05 | 13.61 | 39.63 | 9.89 | 40.00 |
| 1 | 1 | 27 | 1 | 4 | 0.1 | 1.94 | 1.40 | 1.86 | 0.17 | 0.07 | 0 | 16.05 | 13.56 | 39.66 | 9.89 | 40.00 |
| 1 | 1 | 27 | 1 | 4 | 0.1 | 1.93 | 1.40 | 1.86 | 0.17 | 0.08 | 0 | 16.04 | 13.51 | 39.68 | 9.89 | 40.00 |
| 1 | 1 | 27 | 1 | 4 | 0.09 | 1.93 | 1.39 | 1.87 | 0.17 | 0.08 | 0 | 16.03 | 13.46 | 39.71 | 9.89 | 40.00 |
| 1 | 1 | 27 | 1 | 4 | 0.09 | 1.93 | 1.39 | 1.87 | 0.17 | 0.08 | 0 | 16.03 | 13.41 | 39.74 | 9.89 | 40.00 |
| 1 | 1 | 27 | 1 | 4 | 0.09 | 1.93 | 1.38 | 1.87 | 0.17 | 0.09 | 0 | 16.02 | 13.36 | 39.77 | 9.89 | 40.00 |
| 1 | 1 | 27 | 1 | 4 | 0.08 | 1.93 | 1.37 | 1.88 | 0.17 | 0.09 | 0 | 16.01 | 13.31 | 39.79 | 9.89 | 40.00 |
| 1 | 1 | 27 | 1 | 4 | 0.08 | 1.93 | 1.37 | 1.88 | 0.17 | 0.09 | 0 | 16.01 | 13.26 | 39.82 | 9.89 | 40.00 |
| 1 | 1 | 27 | 1 | 4 | 0.08 | 1.93 | 1.36 | 1.88 | 0.17 | 0.10 | 0 | 16.00 | 13.21 | 39.85 | 9.89 | 40.00 |
| 1 | 1 | 27 | 1 | 4 | 0.07 | 1.93 | 1.36 | 1.89 | 0.17 | 0.10 | 0 | 15.99 | 13.16 | 39.87 | 9.89 | 40.00 |
| 1 | 1 | 27 | 1 | 4 | 0.07 | 1.93 | 1.35 | 1.89 | 0.17 | 0.10 | 0 | 15.99 | 13.10 | 39.90 | 9.89 | 40.00 |
| 1 | 1 | 27 | 1 | 4 | 0.07 | 1.93 | 1.35 | 1.89 | 0.17 | 0.10 | 0 | 15.98 | 13.05 | 39.92 | 9.89 | 40.00 |
| 1 | 1 | 27 | 1 | 4 | 0.07 | 1.93 | 1.34 | 1.89 | 0.17 | 0.11 | 0 | 15.97 | 13.00 | 39.95 | 9.89 | 40.00 |
| 1 | 1 | 27 | 1 | 4 | 0.07 | 1.93 | 1.33 | 1.90 | 0.17 | 0.11 | 0 | 15.96 | 12.95 | 39.97 | 9.89 | 40.00 |
| 1 | 1 | 27 | 1 | 4 | 0.06 | 1.92 | 1.33 | 1.90 | 0.17 | 0.11 | 0 | 15.96 | 12.90 | 40.00 | 9.89 | 40.00 |
| 1 | 1 | 27 | 1 | 4 | 0.06 | 1.92 | 1.32 | 1.90 | 0.17 | 0.11 | 0 | 15.95 | 12.85 | 40.02 | 9.89 | 40.00 |
| 1 | 1 | 27 | 1 | 4 | 0.06 | 1.92 | 1.32 | 1.91 | 0.17 | 0.11 | 0 | 15.94 | 12.80 | 40.05 | 9.89 | 40.00 |
| 1 | 1 | 27 | 1 | 4 | 0.06 | 1.92 | 1.31 | 1.91 | 0.17 | 0.11 | 0 | 15.93 | 12.75 | 40.07 | 9.89 | 40.00 |
| 1 | 1 | 27 | 1 | 4 | 0.06 | 1.92 | 1.31 | 1.91 | 0.17 | 0.11 | 0 | 15.92 | 12.70 | 40.10 | 9.89 | 40.00 |
| 1 | 1 | 27 | 1 | 4 | 0.06 | 1.92 | 1.30 | 1.91 | 0.17 | 0.11 | 0 | 15.92 | 12.64 | 40.12 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.06 | 1.92 | 1.29 | 1.92 | 0.17 | 0.12 | 0 | 15.91 | 12.59 | 40.14 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.06 | 1.92 | 1.29 | 1.92 | 0.17 | 0.12 | 0 | 15.90 | 12.54 | 40.17 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.05 | 1.92 | 1.28 | 1.92 | 0.17 | 0.12 | 0 | 15.89 | 12.49 | 40.19 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.05 | 1.91 | 1.28 | 1.92 | 0.17 | 0.12 | 0 | 15.88 | 12.44 | 40.21 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.05 | 1.91 | 1.27 | 1.93 | 0.17 | 0.12 | 0 | 15.87 | 12.39 | 40.23 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.05 | 1.91 | 1.27 | 1.93 | 0.17 | 0.12 | 0 | 15.87 | 12.33 | 40.26 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.05 | 1.91 | 1.26 | 1.93 | 0.17 | 0.12 | 0 | 15.86 | 12.28 | 40.28 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.05 | 1.91 | 1.25 | 1.93 | 0.17 | 0.13 | 0 | 15.85 | 12.23 | 40.30 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.04 | 1.91 | 1.25 | 1.94 | 0.17 | 0.13 | 0 | 15.84 | 12.18 | 40.32 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.04 | 1.91 | 1.24 | 1.94 | 0.17 | 0.13 | 0 | 15.83 | 12.13 | 40.34 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.04 | 1.91 | 1.24 | 1.94 | 0.17 | 0.13 | 0 | 15.82 | 12.08 | 40.36 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.04 | 1.91 | 1.23 | 1.94 | 0.17 | 0.13 | 0 | 15.81 | 12.02 | 40.38 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.04 | 1.90 | 1.23 | 1.95 | 0.17 | 0.13 | 0 | 15.80 | 11.97 | 40.41 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.04 | 1.90 | 1.22 | 1.95 | 0.17 | 0.13 | 0 | 15.79 | 11.92 | 40.43 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.04 | 1.90 | 1.21 | 1.95 | 0.17 | 0.14 | 0 | 15.78 | 11.87 | 40.45 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.03 | 1.90 | 1.21 | 1.95 | 0.17 | 0.14 | 0 | 15.77 | 11.82 | 40.47 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.03 | 1.90 | 1.20 | 1.96 | 0.17 | 0.14 | 0 | 15.76 | 11.76 | 40.48 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.03 | 1.90 | 1.20 | 1.96 | 0.17 | 0.14 | 0 | 15.75 | 11.71 | 40.50 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.04 | 1.90 | 1.19 | 1.96 | 0.17 | 0.14 | 0 | 15.74 | 11.66 | 40.52 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.04 | 1.90 | 1.19 | 1.96 | 0.17 | 0.13 | 0 | 15.73 | 11.61 | 40.54 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.04 | 1.89 | 1.18 | 1.96 | 0.17 | 0.13 | 0 | 15.72 | 11.55 | 40.56 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.04 | 1.89 | 1.17 | 1.97 | 0.17 | 0.13 | 0 | 15.71 | 11.50 | 40.58 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.05 | 1.89 | 1.17 | 1.97 | 0.17 | 0.13 | 0 | 15.70 | 11.45 | 40.60 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.05 | 1.89 | 1.16 | 1.97 | 0.17 | 0.12 | 0 | 15.69 | 11.40 | 40.61 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.05 | 1.89 | 1.16 | 1.97 | 0.17 | 0.12 | 0 | 15.68 | 11.35 | 40.63 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.05 | 1.89 | 1.15 | 1.97 | 0.17 | 0.12 | 0 | 15.67 | 11.29 | 40.65 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.05 | 1.89 | 1.15 | 1.98 | 0.17 | 0.12 | 0 | 15.66 | 11.24 | 40.67 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.06 | 1.88 | 1.14 | 1.98 | 0.17 | 0.12 | 0 | 15.65 | 11.19 | 40.68 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.06 | 1.88 | 1.13 | 1.98 | 0.17 | 0.11 | 0 | 15.63 | 11.14 | 40.70 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.06 | 1.88 | 1.13 | 1.98 | 0.17 | 0.11 | 0 | 15.62 | 11.08 | 40.72 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.06 | 1.88 | 1.12 | 1.98 | 0.17 | 0.11 | 0 | 15.61 | 11.03 | 40.73 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.06 | 1.88 | 1.12 | 1.99 | 0.17 | 0.11 | 0 | 15.60 | 10.98 | 40.75 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.07 | 1.88 | 1.11 | 1.99 | 0.17 | 0.11 | 0 | 15.59 | 10.93 | 40.76 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.07 | 1.88 | 1.10 | 1.99 | 0.17 | 0.10 | 0 | 15.58 | 10.87 | 40.78 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.07 | 1.87 | 1.10 | 1.99 | 0.17 | 0.10 | 0 | 15.57 | 10.82 | 40.79 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.07 | 1.87 | 1.09 | 1.99 | 0.17 | 0.10 | 0 | 15.55 | 10.77 | 40.81 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.07 | 1.87 | 1.09 | 1.99 | 0.17 | 0.10 | 0 | 15.54 | 10.72 | 40.82 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.08 | 1.87 | 1.08 | 2.00 | 0.17 | 0.10 | 0 | 15.53 | 10.66 | 40.84 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.08 | 1.87 | 1.08 | 2.00 | 0.17 | 0.09 | 0 | 15.52 | 10.61 | 40.85 | 9.89 | 40.01 |
| 1 | 1 | 27 | 1 | 4 | 0.08 | 1.87 | 1.07 | 2.00 | 0.17 | 0.09 | 0 | 15.50 | 10.56 | 40.86 | 9.89 | 40.02 |
| 1 | 1 | 27 | 1 | 4 | 0.08 | 1.87 | 1.06 | 2.00 | 0.17 | 0.09 | 0 | 15.49 | 10.51 | 40.88 | 9.89 | 40.02 |
| 1 | 1 | 27 | 1 | 4 | 0.09 | 1.86 | 1.06 | 2.00 | 0.17 | 0.09 | 0 | 15.48 | 10.45 | 40.89 | 9.89 | 40.02 |
| 1 | 1 | 27 | 1 | 4 | 0.09 | 1.86 | 1.05 | 2.00 | 0.17 | 0.08 | 0 | 15.47 | 10.40 | 40.90 | 9.89 | 40.02 |
| 1 | 1 | 27 | 1 | 4 | 0.09 | 1.86 | 1.05 | 2.00 | 0.17 | 0.08 | 0 | 15.45 | 10.35 | 40.92 | 9.89 | 40.02 |
| 1 | 1 | 27 | 1 | 4 | 0.1 | 1.86 | 1.04 | 2.01 | 0.17 | 0.08 | 0 | 15.44 | 10.30 | 40.93 | 9.89 | 40.02 |
| 1 | 1 | 27 | 1 | 4 | 0.1 | 1.86 | 1.03 | 2.01 | 0.17 | 0.07 | 0 | 15.43 | 10.24 | 40.94 | 9.89 | 40.02 |
| 1 | 1 | 27 | 1 | 4 | 0.1 | 1.86 | 1.03 | 2.01 | 0.17 | 0.07 | 0 | 15.42 | 10.19 | 40.95 | 9.89 | 40.02 |
| 1 | 1 | 27 | 1 | 4 | 0.1 | 1.85 | 1.02 | 2.01 | 0.17 | 0.07 | 0 | 15.40 | 10.14 | 40.97 | 9.89 | 40.02 |
| 1 | 1 | 27 | 1 | 4 | 0.11 | 1.85 | 1.02 | 2.01 | 0.17 | 0.06 | 0 | 15.39 | 10.09 | 40.98 | 9.89 | 40.02 |
| 1 | 1 | 27 | 1 | 4 | 0.11 | 1.85 | 1.01 | 2.01 | 0.17 | 0.06 | 0 | 15.38 | 10.03 | 40.99 | 9.89 | 40.02 |
| 1 | 1 | 27 | 1 | 4 | 0.11 | 1.85 | 1.01 | 2.01 | 0.17 | 0.06 | 0 | 15.36 | 9.98 | 41.00 | 9.89 | 40.02 |
| 1 | 1 | 27 | 1 | 4 | 0.12 | 1.85 | 1.00 | 2.02 | 0.17 | 0.06 | 0 | 15.35 | 9.93 | 41.01 | 9.89 | 40.02 |
| 1 | 1 | 27 | 1 | 4 | 0.12 | 1.85 | 0.99 | 2.02 | 0.17 | 0.05 | 0 | 15.34 | 9.88 | 41.02 | 9.89 | 40.02 |
| 1 | 1 | 27 | 1 | 4 | 0.12 | 1.84 | 0.99 | 2.02 | 0.17 | 0.05 | 0 | 15.32 | 9.82 | 41.03 | 9.89 | 40.02 |
| 1 | 1 | 27 | 1 | 4 | 0.12 | 1.84 | 0.98 | 2.02 | 0.17 | 0.05 | 0 | 15.31 | 9.77 | 41.04 | 9.89 | 40.02 |
| 1 | 1 | 27 | 1 | 4 | 0.13 | 1.84 | 0.98 | 2.02 | 0.17 | 0.04 | 0 | 15.29 | 9.72 | 41.05 | 9.89 | 40.02 |
| 1 | 1 | 27 | 1 | 4 | 0.13 | 1.84 | 0.97 | 2.02 | 0.17 | 0.04 | 0 | 15.28 | 9.67 | 41.06 | 9.89 | 40.02 |
| 1 | 1 | 66 | 1 | 8 | 0.13 | 1.84 | 0.97 | 2.02 | 0.19 | 0.06 | 0 | 15.27 | 9.62 | 41.07 | 9.89 | 40.02 |
| 1 | 1 | 66 | 1 | 8 | 0.13 | 1.83 | 0.96 | 2.02 | 0.19 | 0.06 | 0 | 15.25 | 9.56 | 41.08 | 9.89 | 40.02 |
| 1 | 1 | 66 | 1 | 8 | 0.13 | 1.83 | 0.95 | 2.02 | 0.19 | 0.06 | 0 | 15.24 | 9.51 | 41.09 | 9.89 | 40.02 |
| 1 | 1 | 66 | 1 | 8 | 0.13 | 1.83 | 0.95 | 2.03 | 0.19 | 0.06 | 0 | 15.22 | 9.46 | 41.10 | 9.89 | 40.02 |
| 1 | 1 | 66 | 1 | 8 | 0.12 | 1.83 | 0.94 | 2.03 | 0.19 | 0.07 | 0 | 15.21 | 9.41 | 41.11 | 9.89 | 40.02 |
| 1 | 1 | 66 | 1 | 8 | 0.12 | 1.83 | 0.94 | 2.03 | 0.19 | 0.07 | 0 | 15.19 | 9.35 | 41.12 | 9.89 | 40.02 |
| 1 | 1 | 66 | 1 | 8 | 0.12 | 1.83 | 0.93 | 2.03 | 0.19 | 0.07 | 0 | 15.18 | 9.30 | 41.12 | 9.89 | 40.02 |
| 1 | 1 | 66 | 1 | 8 | 0.12 | 1.82 | 0.92 | 2.03 | 0.19 | 0.07 | 0 | 15.16 | 9.25 | 41.13 | 9.89 | 40.02 |
| 1 | 1 | 66 | 1 | 8 | 0.11 | 1.82 | 0.92 | 2.03 | 0.19 | 0.08 | 0 | 15.15 | 9.20 | 41.14 | 9.89 | 40.02 |
| 1 | 1 | 66 | 1 | 8 | 0.11 | 1.82 | 0.91 | 2.03 | 0.19 | 0.08 | 0 | 15.14 | 9.15 | 41.15 | 9.89 | 40.02 |
| 1 | 1 | 66 | 1 | 8 | 0.11 | 1.82 | 0.91 | 2.03 | 0.19 | 0.08 | 0 | 15.12 | 9.09 | 41.15 | 9.89 | 40.02 |
| 1 | 1 | 66 | 1 | 8 | 0.11 | 1.82 | 0.90 | 2.03 | 0.19 | 0.08 | 0 | 15.10 | 9.04 | 41.16 | 9.89 | 40.02 |
| 1 | 1 | 66 | 1 | 8 | 0.11 | 1.81 | 0.90 | 2.03 | 0.19 | 0.08 | 0 | 15.09 | 8.99 | 41.17 | 9.89 | 40.02 |
| 1 | 1 | 66 | 1 | 8 | 0.1 | 1.81 | 0.89 | 2.03 | 0.19 | 0.09 | 0 | 15.07 | 8.94 | 41.17 | 9.89 | 40.02 |
| 1 | 1 | 66 | 1 | 8 | 0.1 | 1.81 | 0.88 | 2.03 | 0.19 | 0.09 | 0 | 15.06 | 8.89 | 41.18 | 9.89 | 40.02 |
| 1 | 1 | 66 | 1 | 8 | 0.1 | 1.81 | 0.88 | 2.04 | 0.19 | 0.09 | 0 | 15.04 | 8.83 | 41.18 | 9.89 | 40.02 |
| 1 | 1 | 66 | 1 | 8 | 0.1 | 1.81 | 0.87 | 2.04 | 0.19 | 0.09 | 0 | 15.03 | 8.78 | 41.19 | 9.89 | 40.02 |
| 1 | 1 | 66 | 1 | 8 | 0.09 | 1.80 | 0.87 | 2.04 | 0.19 | 0.10 | 0 | 15.01 | 8.73 | 41.20 | 9.89 | 40.02 |
| 1 | 1 | 66 | 1 | 8 | 0.09 | 1.80 | 0.86 | 2.04 | 0.19 | 0.10 | 0 | 15.00 | 8.68 | 41.20 | 9.89 | 40.02 |
| 1 | 1 | 66 | 1 | 8 | 0.09 | 1.80 | 0.86 | 2.04 | 0.19 | 0.10 | 0 | 14.98 | 8.63 | 41.21 | 9.89 | 40.02 |
| 1 | 1 | 66 | 1 | 8 | 0.09 | 1.80 | 0.85 | 2.04 | 0.19 | 0.10 | 0 | 14.96 | 8.58 | 41.21 | 9.89 | 40.02 |
| 1 | 1 | 66 | 1 | 8 | 0.08 | 1.80 | 0.84 | 2.04 | 0.19 | 0.11 | 0 | 14.95 | 8.52 | 41.21 | 9.89 | 40.02 |
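Conceptually, scoring places each new observation in the cell whose centroid is closest to it. The base-R sketch below shows that idea. It uses an L1 distance on data normalized with the training means and standard deviations. The function `score_nearest_cell` and its arguments are hypothetical. It does not reproduce scoreHVT's handling of the hierarchy, centroid radii, or anomaly flags.

```r
# Hypothetical sketch: assign each new row to its nearest centroid (L1 distance)
score_nearest_cell <- function(new_data, centers, center_means, center_sds) {
  # Normalize the new data with the *training* statistics
  x <- sweep(as.matrix(new_data), 2, center_means, "-")
  x <- sweep(x, 2, center_sds, "/")
  # n x k matrix of L1 distances from every row to every centroid
  dists <- vapply(seq_len(nrow(centers)), function(j) {
    rowSums(abs(sweep(x, 2, centers[j, ], "-")))
  }, numeric(nrow(x)))
  dists <- matrix(dists, nrow = nrow(x))  # keep the matrix shape for a single row
  cell <- max.col(-dists, ties.method = "first")
  data.frame(cell = cell,
             quant_error = dists[cbind(seq_len(nrow(x)), cell)])
}
```

In practice, `centers` would hold the normalized centroids of the trained map. `center_means` and `center_sds` would hold the column means and standard deviations of the training data.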
Let’s look at the function plotStateTransition, which is used to create a time series plotly object.
plotStateTransition(
  df,
  sample_size,
  line_plot,
  cellid_column,
  time_column
)
df - A data frame containing Cell IDs and timestamps.
sample_size - A numeric value between 0.1 and 1 specifying the fraction of the data to plot. A value of 1 plots the entire dataset. Sampling takes place from the last row to the first.
line_plot - A logical value. If TRUE, the output is a time series plot with a line connecting the states according to sample_size. If FALSE, the output is a time series plot without the connecting line, based on sample_size.
cellid_column - A character string specifying the name of the Cell ID column in the data frame passed to this function.
time_column - A character string specifying the name of the timestamp column in the data frame passed to this function.
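The `sample_size` argument is described as sampling from the last row to the first. One plausible reading is that the function keeps the most recent fraction of rows in time order. The hypothetical helper below shows that selection under this assumption. It is not the code plotStateTransition uses.

```r
# Hypothetical helper: keep the most recent `sample_size` fraction of rows
sample_from_end <- function(df, sample_size, time_column) {
  stopifnot(sample_size >= 0.1, sample_size <= 1)
  # Order chronologically so that "the end" means the latest timestamps
  df <- df[order(df[[time_column]]), , drop = FALSE]
  n_keep <- ceiling(sample_size * nrow(df))
  utils::tail(df, n_keep)
}
```

With `sample_size = 1` every row is kept, which matches the documented behavior of plotting the entire dataset.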
plotStateTransition(df = scored_data, cellid_column = "Cell.ID", time_column = "t", sample_size = 1)
Next, let's look at the function getTransitionProbability, which computes the probability of moving from each cell to its next state.
getTransitionProbability(
  df,
  cellid_column,
  time_column)
df - A data frame containing Cell IDs and timestamps.
cellid_column - A character string specifying the name of the Cell ID column in the data frame passed to this function.
time_column - A character string specifying the name of the timestamp column in the data frame passed to this function.
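This function displays the probability of each Tplus1 (next) state for every cell in the form of a table. Relative_Frequency is the number of observed transitions, and Probability_Percentage is that count expressed as a proportion of all transitions out of the current state (for Cell ID 1, 297/301 ≈ 0.9867). For the sake of brevity, we display the probability tables for Cell IDs 1 to 5 only.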
trans_table <- getTransitionProbability(df = scored_data, cellid_column = "Cell.ID", time_column = "t")
Table(trans_table[[1]])
| Current_State | Next_State | Relative_Frequency | Probability_Percentage |
|---|---|---|---|
| 1 | 1 | 297 | 0.9867 |
| 1 | 4 | 3 | 0.0100 |
| 1 | 6 | 1 | 0.0033 |
Table(trans_table[[2]])
| Current_State | Next_State | Relative_Frequency | Probability_Percentage |
|---|---|---|---|
| 2 | 1 | 4 | 0.0089 |
| 2 | 2 | 445 | 0.9867 |
| 2 | 6 | 2 | 0.0044 |
Table(trans_table[[3]])
| Current_State | Next_State | Relative_Frequency | Probability_Percentage |
|---|---|---|---|
| 3 | 2 | 5 | 0.0105 |
| 3 | 3 | 470 | 0.9874 |
| 3 | 10 | 1 | 0.0021 |
Table(trans_table[[4]])
| Current_State | Next_State | Relative_Frequency | Probability_Percentage |
|---|---|---|---|
| 4 | 4 | 463 | 0.9872 |
| 4 | 8 | 5 | 0.0107 |
| 4 | 12 | 1 | 0.0021 |
Table(trans_table[[5]])
| Current_State | Next_State | Relative_Frequency | Probability_Percentage |
|---|---|---|---|
| 5 | 3 | 4 | 0.0158 |
| 5 | 5 | 248 | 0.9802 |
| 5 | 9 | 1 | 0.0040 |
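The transition tables above can be reproduced in spirit with base R in four steps:

1. Sort the rows by time.
2. Pair each cell ID with the one that follows it.
3. Count the pairs.
4. Divide each count by the total number of transitions out of the current state.

The function `transition_probabilities` is a hypothetical sketch, not getTransitionProbability itself. It assumes numeric cell IDs. Like the tables above, it includes self-transitions, and it returns one data frame per current state.

```r
# Hypothetical sketch of empirical first-order transition probabilities
transition_probabilities <- function(df, cellid_column, time_column) {
  # Sort by time so that consecutive rows are consecutive states
  ids <- df[order(df[[time_column]]), cellid_column]
  from <- head(ids, -1)
  to <- tail(ids, -1)
  # Count every observed (current, next) pair, including self-transitions
  counts <- as.data.frame(table(Current_State = from, Next_State = to),
                          stringsAsFactors = FALSE)
  counts <- counts[counts$Freq > 0, ]
  # Divide each count by the total number of transitions out of its current state
  counts$Probability <- counts$Freq / ave(counts$Freq, counts$Current_State, FUN = sum)
  # Order numerically (assumes numeric cell IDs)
  counts <- counts[order(as.numeric(counts$Current_State),
                         as.numeric(counts$Next_State)), ]
  rownames(counts) <- NULL
  # One data frame per current state, in numeric order of the cell IDs
  split(counts, factor(counts$Current_State, levels = unique(counts$Current_State)))
}
```

For example, `transition_probabilities(scored_data, "Cell.ID", "t")[["1"]]` would list the next states observed after Cell ID 1 and their empirical probabilities.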